DESIGNING DEGENERATE PCR PRIMERS

Field of the invention 

The invention relates to a method of designing a panel of primers for
detecting viruses in a high throughput polymerase chain reaction assay.

Background of the invention 
All organisms, including bacteria, animals and plants, appear to be susceptible
to infection by viruses. Viruses require the use of the cellular translation and
transcription machinery to replicate. In the process of replication they often have
deleterious effects on the host cell and thus on the host organism. Viruses constitute
an important class of pathogens causing many diseases, leading to loss of life in
humans and economic loss in the agricultural industries.

Summary of the invention 

The polymerase chain reaction (PCR) allows the amplification of a specific
region of a polynucleotide. The specificity of the reaction is due to the primers,
which during the course of PCR bind to the region to be amplified in a sequence
specific manner. The invention provides a method of designing primers which can
be used in high throughput screening to detect viruses. The method may be used to
detect unknown viruses which have not yet been sequenced.

In particular the invention provides a method of designing a panel of
degenerate primer pairs for screening for new members of multiple known virus
families in a biological sample, wherein each primer pair in the panel binds a
sequence that is conserved across members of a said virus family and selectively
directs amplification of sequence of said family by PCR, which method comprises

(a) providing a plurality of amino acid sequences from members of a first
virus family,

(b) comparing the sequences to identify conserved regions,

(c) designing a first primer pair using a computer based method, wherein each
primer in the pair binds a nucleotide sequence that encodes a conserved region
identified in (b) and wherein the primer pair is designed to amplify by PCR the
nucleotide sequence between the nucleotide sequences that encode conserved regions
in members of the first virus family, and

(d) repeating steps (a) to (c) for each virus family.

The invention also provides a method of designing a panel of degenerate
primer pairs for screening for new members of multiple known virus families in a
biological sample, wherein each primer pair in the panel binds a sequence that is
conserved across members of a said virus family and selectively directs amplification
of sequence of said family by PCR, which method comprises

(a) providing a plurality of nucleotide sequences from members of a first
virus family,

(b) comparing the sequences to identify conserved regions,

(c) designing a first primer pair using a computer based method, wherein each
primer in the pair binds a conserved region identified in (b) and wherein the primer
pair is designed to amplify by PCR the nucleotide sequence between the conserved
regions in members of the first virus family, and

(d) repeating steps (a) to (c) for each virus family.

The invention additionally provides a panel of primers which has been
designed by the method of the invention.

Detailed description of the invention 

The invention provides a method of designing a panel of primer pairs which
can be used in high throughput virus screening. The method comprises initial steps
which deduce the sequences of the primers using computer based calculations, and
optional later steps in which the primers are synthesised and tested empirically, for
example to determine optimal PCR conditions and/or to select primer pairs with
desired further properties.

The panel of primers provided by the method is designed to be capable of
detecting unknown viruses based on nucleotide and/or amino acid sequences in the
unknown virus which are similar (homologous) to nucleotide and/or amino acid
sequences in a known virus. These conserved sequences typically have a role in
providing a necessary or advantageous activity or property to the virus. Conserved
nucleotide sequences may be coding or non-coding sequences.

In one embodiment the conserved sequences code for or are from virus
proteins which have the following activities: DNA or RNA polymerase (replicase),
topoisomerase (helicase/gyrase), endonuclease (integrase), nucleic acid binding
protein, protease, transcription factor, envelope glycoprotein or structural protein
(e.g. capsid or nucleocapsid protein).

The panel of primers is designed to detect viruses which are single stranded
or double stranded DNA or single stranded or double stranded RNA viruses. The
viruses are generally capable of infecting prokaryotic or eukaryotic cells, such as
bacterial, animal, plant, yeast or fungal cells. Preferably the viruses are mammalian
(preferably primate) or avian viruses, such as human, pig, horse, sheep, goat, cow,
chicken, turkey or duck viruses.

The viruses are typically from any combination of the following families:
Adenoviridae, Arenaviridae, Arteriviridae, Astroviridae, Birnaviridae, Bunyaviridae,
Caliciviridae, Circoviridae, Coronaviridae, Deltavirus, Filoviridae, Flaviviridae,
Hepadnaviridae, Herpesviridae, Orthomyxoviridae, Papovaviridae, Paramyxoviridae,
Parvoviridae, Picornaviridae, Polydnaviridae, Poxviridae, Reoviridae, Retroviridae,
Rhabdoviridae, Togaviridae or Bornavirus.

The primers of the panel are capable of detecting unknown viruses in a
biological sample. Such a sample either originates from a single individual or is a
pooled sample from individuals of the same species. Thus the panel of primers
detects viruses which infect the same species (from which the sample originates).

Generally in the method at least 15, 30, 50, 100, 200 or more, typically up to
a maximum of 300, different primer pairs are designed. The primer pairs designed in
the method bind sequence which is conserved across members of a virus family. The
panel which is designed in the method may comprise primer pairs that bind sequence
which is conserved across substantially all members of the family or across a subset
of the members of the family, for example across all members of a subfamily or of a
genus. Generally, the primer pairs bind at least 70%, at least 80%, or at least 90% of
the known viruses of the family, subfamily or genus. Preferably fewer than 10, such as
fewer than 5, primer pairs will be used for the detection of any given family, subfamily
or genus in the panel.

The panel of primer pairs is generally capable of detecting viruses from at
least 10, 15, 20, 30 or more families, typically up to a maximum of 35 families.

The panel of primer pairs may comprise sets of primer pairs which perform a
nested PCR reaction. Generally such a set of primer pairs comprises a first and
second primer pair. The first primer pair is able to amplify a template nucleotide
sequence from a virus to form a PCR product. The second primer pair is able to
amplify a nucleotide sequence using the PCR product generated by the first primer
pair as a template. The use of nested sets of primer pairs allows increased sensitivity.

In a preferred embodiment each primer pair is specific for a particular virus 
family, so that it does not detect viruses of other families. 

In the method of the invention the plurality of amino acid sequences or
nucleotide sequences are provided from different known viruses of the same family.
The amino acid sequences or nucleotide sequences will be for the same protein of the
different viruses. Typically at least 5, 10, 20, 50, 100 or more sequences are
provided. The maximum number of sequences provided will, for example, be 300
sequences.

Each of the sequences which is provided is typically at least 20, 50, 100, 200
or more amino acids or nucleotides in length. In general the maximum length of the
nucleotide sequences is 1000 nucleotides and the maximum length of the amino acid
sequences is 300 amino acids. The sequences may be obtained from a database of
sequences, such as GenBank. The sequences may be obtained from a database
comprising virus sequences which are organised into homologous protein families
(based on sequence similarity relationships).

In a preferred embodiment the sequences are obtained from the VIDA
database (described in Alba et al (2001) Nucleic Acids Research 29, 133-136) or the
Virus Division of GenBank. The sequences may be provided in the form of a
database, preferably in computer-readable form. The sequences are preferably
provided in the form of a computer-readable database constructed using programs
which identify homologous protein families, such as GeneTableMaker, MKDOM or
PSCBuilder.

The sequences which have been provided are compared to identify conserved
regions. Typically such conserved regions will have a length of at least 12
nucleotides, such as at least 15, 21, 27, 36, 99 or more nucleotides (generally up to a
maximum length of 200 nucleotides) or at least 4, 5, 7, 10, 25 or more amino acids
(generally up to a maximum length of 50 amino acids).

Across the conserved region the virus sequences which are being provided
will of course share identity or similarity. Typically the amino acids or nucleotides
in at least 50% of the positions in the region will be the same in at least 50%, 60%,
70%, or 80% of the viruses of the group (i.e. in the family, genus or subfamily).
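
As a rough illustration of the threshold just described, the following Python sketch
flags an aligned, gap-free region as conserved when at least half of its positions
carry the same residue in at least a chosen fraction of the sequences. The function
name `is_conserved`, its parameters and the toy segments are hypothetical and are
not part of the described method.

```python
from collections import Counter

def is_conserved(segments, min_fraction=0.5, min_positions=0.5):
    """Return True if enough columns of an aligned, gap-free region are conserved.

    A column counts as conserved when its most common residue occurs in at
    least `min_fraction` of the sequences. The region counts as conserved when
    at least `min_positions` of its columns are conserved.
    """
    if not segments:
        raise ValueError("at least one sequence segment is required")
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("segments must be aligned to the same length")
    conserved_columns = 0
    for column in zip(*segments):
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / len(segments) >= min_fraction:
            conserved_columns += 1
    return conserved_columns / length >= min_positions

# Toy aligned segments (hypothetical, not taken from Table 1)
region = ["FVDEANF", "FVDEANF", "FVDESHF", "YVDEANF"]
print(is_conserved(region, min_fraction=0.8))
```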

The algorithm which identifies conserved regions generally uses a multiple
sequence alignment method. The method may comprise (a) aligning all pairs of
sequences separately to calculate a distance matrix giving the divergence of each pair
of sequences, (b) calculating a guide tree from the distance matrix, and (c) aligning
the sequences progressively according to the branching order in the guide tree.
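
Steps (a) and (b) can be sketched in Python as follows. This is a simplified
illustration under stated assumptions: the toy sequences already have equal length,
so the fraction of differing positions stands in for a pairwise alignment score, and
the guide tree is built by simple average-linkage clustering rather than by the
tree-building method of any particular alignment program. All names are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length in this sketch")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def guide_tree(seqs):
    """Build a guide tree as nested tuples of sequence names (average linkage)."""
    # Each cluster is (tree, member names); start with one cluster per sequence.
    clusters = [(name, [name]) for name in seqs]
    # Step (a): distance matrix over all pairs of sequences.
    dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}

    def cluster_distance(c1, c2):
        pairs = [(m, n) for m in c1[1] for n in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    # Step (b): repeatedly join the two closest clusters.
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Toy sequences (hypothetical)
toy = {"A": "MFGGLLGE", "B": "MFGGLIGE", "C": "MLRTCDIT", "D": "MLRSCDID"}
print(guide_tree(toy))
```

The nesting of the returned tuples gives the branching order in which a progressive
alignment, step (c), would combine the sequences.
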

A preferred algorithm for aligning the sequences is
CLUSTALW as described in Thompson et al (1994) Nucleic Acids Research 22,
4673-80. Other algorithms that can be used for aligning sequences are MultAlin
(Corpet (1988) Nucleic Acids Research 16, 10881-90) or Jalview (Clamp et al (1998)
http://barton.ebi.ac.uk). BLOCKS of conserved regions of amino acids may be
extracted from the multiple alignments, typically using the program Blocks Multiple
Alignment Processor. Alternatively the entire process of performing multiple
alignments and extracting BLOCKS can be performed using BLOCKMAKER
(Henikoff and Henikoff (1994) Genomics 19, 97-107).

The output from the alignment and BLOCK extraction step (i.e. the
information describing the identified conserved regions) is then entered into the
algorithm which designs the primers. Such output is typically in the form of partial
sequences which correspond to the conserved regions (BLOCKS). These BLOCKS
are input into a primer design algorithm. In one embodiment such an algorithm is
CODEHOP.

In the primer design step the conserved regions which are chosen as targets
for primers preferably comprise few codons with degenerate counterparts, i.e.
preferably the sequence has a low redundancy, such as a redundancy of less than 512
fold, 256 fold or 128 fold. Each primer binds in accordance with Watson-Crick base
pairing and thus the binding is sequence specific. Each primer will thus be designed
to be wholly or partially complementary to the sequence to which it binds.
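
By way of illustration, the redundancy of an amino acid segment can be estimated
as the product of the number of codons that encode each residue. The Python sketch
below assumes the standard genetic code; the function names and the toy segment
are hypothetical.

```python
# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}

def peptide_redundancy(peptide):
    """Number of distinct nucleotide sequences that can encode the peptide."""
    redundancy = 1
    for residue in peptide:
        redundancy *= CODON_COUNTS[residue]
    return redundancy

def low_redundancy_windows(sequence, width, max_redundancy=128):
    """List (start, window, redundancy) for windows not exceeding the limit."""
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        redundancy = peptide_redundancy(window)
        if redundancy <= max_redundancy:
            hits.append((start, window, redundancy))
    return hits

print(peptide_redundancy("MWDEF"))   # Met and Trp have one codon each
print(peptide_redundancy("LSRAG"))   # Leu, Ser and Arg have six codons each
print(low_redundancy_windows("LSRMWDEFG", width=4))
```

Segments rich in methionine and tryptophan give low values, whereas segments rich
in leucine, serine and arginine quickly exceed the preferred limits.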

Each of the primers typically has a length of at least 8 nucleotides, such as at
least 10, 12, 15, 20, 30, 40 or more nucleotides (up to a maximum of 50 nucleotides
for example). In one embodiment the primer may comprise at least 2, 4 or 6, up to a
maximum of 10 for example, inosine bases. Inosine is able to bind to any of the four
nucleotides and therefore use of inosine causes a reduction in effective redundancy.
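
The effect of inosine on redundancy can be illustrated with a short Python sketch.
It assumes the primer is written with IUPAC ambiguity codes and that inosine is
written as "I" and counted as a single base; the function name and the example
primers are hypothetical.

```python
# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def effective_redundancy(primer, inosine="I"):
    """Number of distinct oligonucleotides in the degenerate primer pool.

    Inosine positions contribute a factor of 1 because a single base is
    synthesised at that position.
    """
    total = 1
    for base in primer.upper():
        if base == inosine:
            continue
        total *= len(IUPAC[base])
    return total

print(effective_redundancy("GCNGGNATHGA"))  # two four-fold positions and one three-fold
print(effective_redundancy("GCIGGIATHGA"))  # same primer with inosine at the N positions
```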

Each primer pair will be designed so that the PCR product generally has a 
length of at least 20, such as at least 50, 100, 200, 500, 1000 or more nucleotides 
(and typically up to a maximum of 5x10^ nucleotides long). 

Each primer is preferably designed so that it anneals to a single site, i.e. the 
primer will not bind to any other site in the genome of the relevant virus. 

Each primer is preferably designed so that it does not exhibit secondary
structure, i.e. the nucleotides in the primer will not bind substantially to any other
nucleotide in the primer apart from those to which it is covalently linked. In addition
preferably each primer is designed so that it does not bind other primers with the
same sequence.
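
A crude screen for such self-binding can be sketched in Python as follows. It simply
looks for short stretches of the primer whose reverse complement also occurs in the
primer; it is not a thermodynamic model, and the function name, the stretch length
`k` and the example primers are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def self_complementary_stretches(primer, k=4):
    """Return the k-mers of `primer` that could base-pair with another part of
    the same primer, or with a second copy of it."""
    primer = primer.upper()
    reverse_complement = primer.translate(COMPLEMENT)[::-1]
    kmers = {primer[i:i + k] for i in range(len(primer) - k + 1)}
    # A k-mer found in the reverse complement has a pairing partner in the primer.
    return sorted(kmer for kmer in kmers if kmer in reverse_complement)

print(self_complementary_stretches("GGGATCCC"))    # palindromic, pairs with itself
print(self_complementary_stretches("AAAAACCCCC"))  # no complementary stretches
```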

In one embodiment the 3' region, and preferably the 3' terminal nucleotide of 
the primer binds to the target sequence with high affinity, thus preferably this region 
or nucleotide comprises a G or C. 

Generally each primer is designed to have an annealing temperature of from
30 to 65 °C, such as 50 to 60 °C or 35 to 45 °C. In addition each primer pair may be
designed to ensure that the two primers do not bind to each other.
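
The text does not specify how the annealing temperature is calculated. As one
possible illustration, the Python sketch below uses the Wallace rule of thumb for
short oligonucleotides (2 °C per A or T and 4 °C per G or C) to estimate a melting
temperature, and assumes that the annealing temperature is set a few degrees below
it. The 5 °C offset and all names are assumptions made for this example.

```python
def wallace_tm(primer):
    """Estimate melting temperature in degrees C using the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("only unambiguous A, C, G, T bases are supported")
    return 2 * at + 4 * gc

def estimated_annealing_temp(primer, offset=5):
    """Assumed annealing temperature: melting temperature minus an offset."""
    return wallace_tm(primer) - offset

def within_range(primer, low=30, high=65):
    """Check the estimate against the 30 to 65 degrees C range given above."""
    return low <= estimated_annealing_temp(primer) <= high

primer = "ATGCATGCATGCATGC"  # hypothetical 16-mer with 8 A/T and 8 G/C
print(wallace_tm(primer), estimated_annealing_temp(primer), within_range(primer))
```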

The primers are designed by a computer based algorithm. In one embodiment 
such an algorithm designs primers according to the following rules: 

1) A set of blocks is input, where a block is an aligned array of amino acid
sequence segments without gaps that represents a highly conserved region of
homologous proteins. A weight is provided for each sequence segment, which can be
increased to favour the contribution of selected sequences in designing the primer. A
codon usage table is chosen for the target genome.

2) An amino acid position-specific scoring matrix (PSSM) is computed for
each block using the odds ratio method.

3) A consensus amino acid residue is selected for each position of the block
as the highest scoring amino acid in the matrix.

4) For each position of the block, the most common codon corresponding to 
the amino acid chosen in step 3 is selected utilizing the user-selected codon usage 
table. This selection is used for the default 5' consensus clamp in step 8. 

5) A DNA PSSM is calculated from the amino acid matrix (step 2) and the
codon usage table. The DNA matrix has three positions for each position of the
amino acid matrix. The score for each amino acid is divided among its codons in
proportion to their relative weights from the codon usage table, and the scores for
each of the four different nucleotides are combined in each DNA matrix position.
Nucleotide positions are treated independently when the scores are combined. As an
option, the highest scoring nucleotide residue from each position can replace the
most common codons from step 4 that are used in the consensus clamp.

6) The degeneracy is determined at each position of the DNA matrix based on
the number of bases found there. As an option, a weight threshold can be specified
such that bases that contribute less than a minimum weight are ignored in
determining degeneracy.

7) Possible degenerate core regions are identified by scanning the DNA
matrix in the 3' to 5' direction. A core region must start on an invariant 3' nucleotide
position, have length of 11 or 12 positions ending on a codon boundary, and have a
maximum degeneracy of 128 (this is the default setting of CODEHOP). The
degeneracy of a region is the product of the number of possible bases in each
position.

8) Candidate degenerate core regions are extended by addition of a 5'
consensus clamp from step 4 or 5. The length of the clamp is controlled by a melting
point temperature calculation (the CODEHOP default is 60 °C) and is usually about
20 nucleotides.

9) Steps 7 and 8 are repeated on the reverse complement of the DNA matrix
from step 5 for primers corresponding to the opposite DNA strand.
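
Steps 6 and 7 of the rules above can be sketched in Python as follows. The sketch
assumes that the DNA matrix has already been reduced to the set of bases allowed
at each position (after any weight threshold), written here with IUPAC ambiguity
codes, and that position 0 is the first base of a codon. It is an illustration of the
scanning rule only, not the CODEHOP source code, and all names are hypothetical.

```python
from math import prod

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def region_degeneracy(columns):
    """Product of the number of possible bases at each position."""
    return prod(len(bases) for bases in columns)

def find_core_regions(columns, lengths=(11, 12), max_degeneracy=128):
    """Return (start, end, degeneracy) tuples, 0-based inclusive, indexed 5' to 3'."""
    cores = []
    for end in range(len(columns) - 1, -1, -1):   # scan in the 3' to 5' direction
        if len(columns[end]) != 1:                 # 3' position must be invariant
            continue
        for length in lengths:
            start = end - length + 1
            if start < 0 or start % 3 != 0:        # 5' end must sit on a codon boundary
                continue
            degeneracy = region_degeneracy(columns[start:end + 1])
            if degeneracy <= max_degeneracy:
                cores.append((start, end, degeneracy))
    return cores

# Toy DNA matrix for the peptide W-M-D-E-F (TGG ATG GAY GAR TTY)
columns = [set(IUPAC[base]) for base in "TGGATGGAYGARTTY"]
print(find_core_regions(columns))
```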

In one embodiment CODEHOP (Rose et al (1998) Nucleic Acids Research
26, 1628-1635) is used to design the primer pairs. This program uses the above
rules.

The primers designed by the algorithm may then be mapped back to the
original sequence to choose primer pairs which provide the desired length of PCR
product.
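
This selection can be illustrated with a minimal Python sketch. It assumes that each
forward primer has been mapped to the position of its 5' end on a reference sequence
and each reverse primer to the position of the last base of the product it defines
(0-based, inclusive). The function name, data layout, coordinates and length limits
are all hypothetical.

```python
def pick_pairs(forward_sites, reverse_sites, min_len=100, max_len=500):
    """Return (forward, reverse, product_length) for pairs in the wanted size range."""
    pairs = []
    for f_name, f_start in forward_sites:
        for r_name, r_end in reverse_sites:
            length = r_end - f_start + 1
            if min_len <= length <= max_len:
                pairs.append((f_name, r_name, length))
    return pairs

# Hypothetical mapped positions on one reference sequence
forward_sites = [("F1", 120), ("F2", 480)]
reverse_sites = [("R1", 399), ("R2", 900)]
print(pick_pairs(forward_sites, reverse_sites))
```

Pairs in which the reverse primer lies upstream of the forward primer give a negative
length and are therefore rejected by the same test.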

The above-described computer based method is repeated until the desired
number of primer pairs have been designed. Optionally the primer pairs can then be
synthesised and tested. They are typically tested to determine the optimal conditions
for using the primers in a PCR reaction.

The PCR reaction is carried out in a PCR mixture that generally comprises
the following: the template polynucleotide (which will be amplified in the event of
virus detection), one or more primer pairs designed as described above, a polymerase
enzyme (typically a DNA polymerase, such as Taq polymerase), deoxynucleotide
triphosphates (dATP, dTTP, dCTP and dGTP) and a suitable buffer.

The PCR reaction generally comprises cycles of the following steps: a
denaturation step, a primer annealing step and a polynucleotide synthesis step.
Typically the PCR reaction comprises at least 25 cycles, such as 30, 35, 40 or more
cycles, up to a maximum of 60 cycles for example. Generally in the denaturation
step the PCR mixture is heated to a temperature at which the polynucleotides in the
PCR mixture (in particular the polynucleotide region to be amplified) denature to
single stranded form. The denaturing temperature is generally from 85 to 98 °C.

In the primer annealing step the primers bind to template nucleotide sequence
in a sequence specific manner. This step is generally carried out at a temperature of
from 30 to 65 °C. In the polynucleotide synthesis step the polymerase
replicates/synthesises nucleotide sequence based on template sequence by addition of
nucleotides to the 3' end of the bound primers. This step is generally carried out at
about 72 °C.

In one embodiment the primers are tested for their ability to amplify one or
more of the plurality of nucleotide sequences from known viruses which were used to
design the primers, or in the case of amino acid sequences from known viruses being
used to design the primers the primers may be tested for their ability to amplify the
nucleotide sequence from the virus which encodes the amino acid sequence.

The primers may be tested in a range of buffer conditions to determine
optimal buffer conditions for PCR using the primers. The buffer conditions which
may be tested include pH (typically between 7 and 10), magnesium concentration
(typically from 0.5 mM to 5 mM), potassium chloride (typically from 0 to 100 mM),
ammonium chloride (typically 0 to 100 mM), glycerol (typically 0 to 20%),
dimethylsulphoxide (typically 0 to 20%), ethanol (typically 0 to 20%), sorbitol
(typically 0 to 20%) or betaine (typically 1 M betaine).
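
A set of buffer conditions to test can be enumerated with a short Python sketch such
as the one below. The three factors and the particular levels are example values
chosen from within the ranges given above; they are not prescribed test points, and
the names are hypothetical.

```python
from itertools import product

# Example levels within the stated ranges (assumed for illustration only)
conditions = {
    "pH": [7.0, 8.5, 10.0],
    "Mg_mM": [0.5, 2.5, 5.0],
    "KCl_mM": [0, 50, 100],
}

def buffer_grid(conditions):
    """Return every combination of the listed levels as a list of dicts."""
    names = list(conditions)
    levels = (conditions[name] for name in names)
    return [dict(zip(names, values)) for values in product(*levels)]

grid = buffer_grid(conditions)
print(len(grid))   # 3 x 3 x 3 combinations
print(grid[0])
```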

The primers may be tested at a range of different temperatures to determine
the optimal temperatures in the PCR reaction. Preferably the primers are tested in a
PCR reaction in which a range of primer annealing temperatures are tested.
Typically the range of temperatures is from 30 to 65 °C.

The panel of primer pairs or a group of primers within the panel may be
designed to be used together on the same plate (i.e. using the same thermal cycles).
Thus such primer pairs will be designed to work at the same annealing temperature.

In one embodiment a group of primer pairs within the panel are designed to
have similar optimal conditions for use in PCR so that they can be used optimally in
the same well or reaction vessel, i.e. that they can be used in multiplex PCR. Such a
group typically comprises at least 2, 3, 4, 5, 6 or more primer pairs (up to a
maximum of 8 primer pairs for example).

To provide such primer pairs the computer based method steps may be used
to design primer pairs which are calculated to have similar annealing temperatures
and/or the primers are tested to select primer pairs which can be used optimally
together. Such testing typically determines whether the primers work optimally with
the same buffers and/or whether the primers have similar annealing temperatures.
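
One simple way of grouping primer pairs with similar annealing temperatures is
sketched below in Python. It sorts the pairs by temperature and starts a new group
whenever a pair differs from the first member of the current group by more than a
tolerance, or when the group is full. The 2 °C tolerance, the greedy strategy, the
names and the temperatures are all assumptions made for this example.

```python
def group_for_multiplex(annealing_temps, tolerance=2.0, max_group_size=8):
    """Group primer pair names so that temperatures within a group are similar."""
    groups = []
    for name, temp in sorted(annealing_temps.items(), key=lambda item: item[1]):
        current = groups[-1] if groups else None
        if (current and len(current) < max_group_size
                and temp - annealing_temps[current[0]] <= tolerance):
            current.append(name)
        else:
            groups.append([name])
    return groups

# Hypothetical primer pairs and annealing temperatures in degrees C
temps = {"herpes_1": 52.0, "herpes_2": 53.5, "adeno_1": 58.0,
         "pox_1": 59.0, "corona_1": 45.0}
print(group_for_multiplex(temps))
```
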

In one embodiment at least one or both primers of each primer pair in the
group carries a label. Typically at least one of the primers in each primer pair will
carry a different label from that used for the other primer pairs. The PCR product
generated by labelled primers carries the labels present on the primers. Thus after the
group of primers have been used for PCR in the same well detection of the labels in
the PCR products can be used to deduce which PCR product was formed from each
primer pair. In one embodiment all forward primers of the group are labelled with
one colour and the reverse primers are labelled with a different colour.

In a preferred embodiment the primers are labelled with a fluorescent label,
such as fluorescein based labels (e.g. fluorescein isothiocyanate). Different primer
pairs may be labelled with fluorescent labels of different colours. The fluorescent
labels which are used may be capable of detection by a Beckman CEQ2000™ or
Applied Biosystems A3700™ fluorescent DNA analyser. The fluorescent labels may be
obtained from Beckman Coulter or Applied Biosystems.

Another way of being able to determine which PCR products are generated
by which primer pair is for each primer pair in the group to generate a PCR product
of different size to the PCR products generated by the other primer pairs of the
group. Typically each PCR product which is generated by the group of primers
differs in size from all the other PCR products by at least 20, such as at least 50, 100,
200, 500, 1000 or more nucleotides. Each PCR product may for example differ in
size from all other PCR products by up to a maximum of 3000 nucleotides.
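
The size criterion can be checked with a few lines of Python. The sketch below
assumes that the expected product sizes of a group are known in nucleotides; the
function name and the example sizes are hypothetical.

```python
def sizes_distinguishable(product_sizes, min_difference=20):
    """True if every product differs from every other by at least min_difference."""
    ordered = sorted(product_sizes)
    # Checking neighbours in sorted order is enough to cover all pairs.
    return all(b - a >= min_difference for a, b in zip(ordered, ordered[1:]))

print(sizes_distinguishable([150, 210, 480]))
print(sizes_distinguishable([150, 160, 480]))
```
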

The following Example illustrates the invention:

Example

The Example below refers to Figure 1 which shows how primers were
designed using a database known as 'VIDA', and computer programs known as
'CLUSTALW', 'BLOCKMAKER' (or 'BLOCKS') and 'CODEHOP'.

Designing a panel of primers

A panel of primers was designed for detecting unknown viruses from the
family Herpesviridae according to the strategy shown in Figure 1. The amino acid
sequences of herpesvirus DNA packaging protein UL15 were obtained from the
VIDA database (Alba et al, see above). These sequences are shown in Table 1.

The sequences obtained from the VIDA database were then imported into
CLUSTALW. This compares the protein sequences to identify conserved regions
and then aligns the sequences according to the conserved regions. The alignment
produced by CLUSTALW is shown in Table 2.

The BLOCKMAKER program was then used to extract blocks of conserved
aligned sequences which do not contain gaps from the CLUSTALW alignment and
enter them into CODEHOP. The primer sequences were then designed by CODEHOP
using the conserved sequences. The output from the CODEHOP program is shown in
Table 3. The 'Complement of Block' sequences shown in Table 3 show the sequence
of the other strand, allowing primers to be designed for amplification in the opposite
direction.
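
For illustration, the sequence of the other strand of a degenerate nucleotide
sequence can be derived with the following Python sketch, which complements each
IUPAC ambiguity code and reverses the result. The function name and the example
sequence are hypothetical and are not taken from Table 3.

```python
# Each IUPAC code maps to the code for the complementary set of bases.
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(seq):
    """Reverse complement of a sequence written with IUPAC ambiguity codes."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]

# GAY GAR TTY encodes Asp-Glu-Phe; the result reads 5' to 3' on the other strand
print(reverse_complement("GAYGARTTY"))
```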



Table 1. All protein sequences of DNA packaging protein UL15 extracted from VIDA.
Here written as a list and unaligned.

>gi_10180719 

5 MFGGLLGEETKRHFERLMKTKNDRLGASHRIJERS IRDGDMVDAPFIiNFAI PVPRRHQTVMPAIGILHNCC 
DSIiGIYSAITTRMLySSIACSKPDEIiRRDSVPRCyPRITWAQAP^ 
NAYYSTMNSFISMRTSDAFKQIiTVFISRFSKLLIASFRDVli^^ 
miliMHATYFVTSVLLGDHAERAERLIiRVAFDTPHFSDIVTRHFRQ^ 

MSSFEGIRIGYTSHIRKAIEPVFEDIGDRIjRRWFGAHRVDH\nCGETITFSFPSGLKSTVTFASSHl^ 
1 0 RGQDEJSTLLFVDEANFIRPDAVQTI IGFIJSTQATCKIIFVSSTN^SGKi^TSFLYCaijKGSADDIjIJS^^ 

EHMKHVTDYTNATS CSCVVLiHKPVFITPIDGftMRRTAEMFLPDSFMQEIIGGGVVDRTI CQC3DRS IFTASA 

IDRFLIYRPSTV^mQDPFSQDLYVYVDPAFTAOTKASGTGVAVIGKYGTDYIVFGLEHYF^ 

SIGYCVAQClilQICAIHRKRFGVIKIAIEGNSNQDSAVAIATRIAIEMISYMKAAVAPTPHNVSFYH^ 

NGTDVEYPYFIjLQRQKTTAFDFFIAQFNSGRVIASQDIiVSTTVSLTTDPVEYLTKQIi^ 
15 RTFSGKKGGeroDTVVAL'IMAVYISAHlPDMAFAPIRV 

>gi_7673139

MFGGLIiGEETKRHFERIjMKTKNDRIiGASHRNERS IRDGDMVDAPFUSTFAI PVPRRHQTVMPAIGILHNCC 
DSIiGI YSAITTRMXiYSS lACSEFDELRRDS VPRCYPRITNAQAFLS PMMMRVANS I IFQEYDEMECAAHR 
NAYYSTMNSFISMRTSDAFKQLTVFISRFSKLLIASFRDVSnOLiDDB^^^ 
20 KMI FDACHLFCNFCFTWRSRRASERIjLRVAFDTPHFSDIVTRHFRQRATV^ 

MSSFEGIRIGYISHIRKAIEPVFED IGDRLRRWFGAHRVDH^^GETITFSFPSGIiKSTVTFASSHNTNS I 
RGQDFJSTLLFVDEANFIRPDAVQTI IGPIiNQATCKI IFVSSTNSGKASTSFLYGLKGSM 
EHMKHVTDYTNATSCSCnnniilSr^ 

IDRFL lYRPSTVITNQDPFSQDIiYVYVDPAFTANTKASGTGVAVI GKYGTDYI VFQLEHYFIiRALTGES SG 
25 SIGYCVAQCIiIQICAIHRKRFGVIKIAIEGNSNQDSAVAIATRIAIEMISYMK2^VAPTPHIWSFYHSKS 
NGTDVEYPYFLLQRQKTTJ^FFIAQFNSGRVIjASQDIjVSTTVS 
RTFSGKKGGNDDTVVAIiTMAVYISAHIPDM2^APIRV 
>gi_5689285 

MFGGALGESAKKHFERIiliRDIOJffiRIiGASRKiracXi^^GGS 
3 0 DGTGIYSAIATRIiLYAGI VSSEFGEVRRESLSNGHISKRNREALLAPTLra ITFHEYDDAQC3UUJR 
NAYYSTMNTFGSMRTSDAFQQLASFIDRFSKLLAASFKDVNIIiDRinS^^ 

KMILMHATYFLTSVIjLEDHAERAERIjLRVIFDIPDFSDAAT^^ 

MSSFEGIRIGYTSHIRKAIEPVFEEIGDRLRRWFGTQCrVDH\nCGETITFSFPSGSRSTVTFASSH^^ 

RGQDFIOjLFVDEANFIRPDAVQTI IGPiaTQANCKI IFVSSTNSGKASTSFL 
3 5 EHMK^VTNYTimTSCSCYVtiEIKPVFITO^ 

VERLl^YRPSTy^QDILSRDLYVYVDPAPTAimiASGTGIAVIGRYGADYIIFGL^ 

AIGECAAQCIAQI CAIHCERFGTIRVAVEGNSKQDSAVAIATRIS IDLAS YVQSGVAPAPHDVCFYHSKP 

AGSNVEYPFFLLQRQKTAAFDFFIARFNSGRVIjASQDIiVSTTISLSTDPVEYIiTK 

RTFSGKKGGYDDTVVAIiVMAVYISAHASDATFAPIRGVEATCKGPTEA 
40 >gi_1869837 

MFGQQlASDVQQYIjERIiEKQRQQKVGVDEASAGIiTLGGDAliRVPFLDFATATPKRHQTWPGVGl^^ 

EHSPLFSAVARIUljLFNSIiVPAQIiRGRDFGGDHTAKIjEFLAPELVRAVARI^^ 

Iiim'QALHRSEAFRQLVHFVRDFAQrjr^ 

HATYFIiAAVLLGDH^^QWrFLRLVFEI PLPSDTAVRHFRQRATVFIiVPRRHGKTWFLVPLI ALSLAS PR 
45 GIKIGYTAHIRKATEPVFDEIDACLRGWFGSSRVDH^nKGETISFSFPDGSRSTIVFASSHimTGIR^ 
NLLFVDEJOTIRPDAVOTIMGFUSrQANCKI IFVSSTEaTGKZ^TSFLYNliRG2iADEIiIJ^^

tATTHTNATACSCYIIil^PVFITmGAViyRTADLFLPDSFMQEIIG^ 

SIiAQVLAltoG2VFRSVRVAVECasTSSQDSAVAIATHVHTEMHRIIi2^ 

PFFIiIj1SIKQKTPAFEYFIKKFNSGGVMASQEI.VSVTV^ PNTDVRMYSGKR 

NGAADDLMVAVIMAIYLAAPTGIPPAFPPITRTS 

>gi_59501

MFGQQLl^DVQQYljERIiEKQRQLKVGADEASAGLmGGDAIiRVPFIiDFATA 
EHSPIiFSAVARKLIiFNSIjVPAQiiKGRDFGGDHTAKIiEFIiAPELVR^ 
IiNTFQALHRSEAFRQnVHFVRDFAQLLKTSFRASSLTOTTGPPKKRAKTO 
HATYFIiAAVLLGDHAEQVOTFLRIjVFEIPLFSDAAVRHFRQRATVF^^ 

GIKIGYTAHIRKATEPVFEEIDACLRGWFGSARVDHVKGETISPSFPDGSRSTIOTASSHNmGIRGQDP 
imjliFVDEACTFIRPDAVQTIMGErjSrQANCKI IFVSSTNTGjECASTSFLYNIiRGaADELLN^ 

PSTTTNSGLMAPDLYVYVDPAFTAimiASGTGVAWGRYRDDYIIFAI^ 
SLTQVIiAIjHPGAFRGVRmVEGNSSQDSAVAIATHVinEMHRI^^ 

YPFFLIiNKQKTPAFEHFIKKFNSGGVMT^ PNTDVRTYSGK 
RNGASDDLMVAVIMAIYLAAQAGPPHTFAPITRVS 

>gi_2605992

MFGKALSIlETIQYFETLRKEVQSRSGAiainiAAEAQTGGEDDVKTAFLNFAIP 

CETAQIFASVAIOUjLFRSIiSKWRGGESKERLDPSSVEAYVDPKVKQALKTIS 

I^INTFDSLRSSDAFHQVANFVARFSRIiVDTSFNGADLDGDGQOTSKRIKVD 

^IHATYFIAAVILGDHADRIG2mlKMVFNTPEFSDATIRHFRQRATVFIlVPRRHGK^^ 

KGIKIGYTAHIRKATEPVFDEIGARJbRQVfFGNSPVDHVKGENISFSFPDGSKSTIVFASSHirai^^ 

FIOiLFVDEAITFIRPEAVQTIIGFIJSrQTNCKIIFVSSTKrrGKASTSFL 

RViCAHraATSCSCYIIJHKPVFIT^^DGAM^^ 

YRPSTVJUiTQDIMSlSnsriiYVYVDPAFrTNiM^ 

AQCn^AKVFAIHSRPFDSVRIAVEGNSSQDAAVAIATNIQLEIiNTIjRQADVVHMPG 

YPFFIiLQKQKTGAFDHFIKAFNSaLVIiZV^QELISimTRLQTDPTO 

RWGASDDMLVALVimVYMASIiPPTTNAPSSLSTQ 

>gi_330792

MFGRVLGRETVQYFEAIiRREVQARRGAKmi2U^QNGGEDDAKTAFIiN^ 

CETAQIPASVARRLLFRSLSKWQSGEARERIjDPASVEAYVDPKVRQAEiKTISFVEyS 

IMOTFDALRSSDAFHQVajSFVAIlPSRLVDTSFNGADLDGDGQffi 

MHATYFIAAVILGDHADRIG2lFLK^fVFOTPEFSDATIRHFRQRATWLV 

KGIKIGYTAHIRKATEPVFDEIGARLRQWFGNSPVDHVKGENISFSFPDGSKSTIVFASSHl^^ 
FNUjFVDEANFIRPEAVQTI IGFtiHTQTWCKI IPVSS'IWTGKaSTSFLYNIiKGA?U3DriIjEn^^ 
RVKAHXNATACSCYIIiNKPVFITMDGAMROTAEIjFIjPDSFMQ 

YRPSTVMQDIMSSDLYVYVDPAFTTNAMASGTGVAWGRYRSNWVVFGMEHFFIiSAIiTGSSAELIAROT 

AQCIiAQVFAIHKRPFDSVRVAVEGNSSQDAAVAIATNIQLEIOTLRRADW 

YPFFLLQKQKTGAFDHFIKAFWSGSVLASQELVSNTVRLQTO 

RNGASDDMLVALVMAVYIiSSIiPPTSDAFSSIiPAQ 

>gi_971317 

MFGGAVGEQSARYFQRIiIiRERQRRAAERGARPDGGGGARGEDDARVPFIiDFAVAAPKRHQTW 
GYCSniAPLFAATASRIiIJliTSMARAEAGrjsrrGTGEftJ^ 

AIlES^!RASG^^AQVA2VPVARFSRIJVGTSFSHIlGGGDDADPPRAKRARVEPPSGQTRG^ 
TYFVAATIiGEHAERIGZVFIiRVAFNTPDPSDAAVAHFRQ^

KIGYTi^IRKATEPVFEEIVARtiRQWFGGERVDHVKGEVISFSFPDGARSTIWASSHimTGIRGQDF^ 
LFVDEANFIRPEAVQT I VGFIaNQASCKI I FVS STOTGKASTSFLYimKGASDGIiliNV^^ I CNEHTPRVA 
AHGGATACSCnfVXiNKPVFITMDAlUJiEOTAET IGGGEVARRAEPAAVFTRAAGEQFIiLYRP 
5 STAAARGPWPERLYMyiDPAFTSNARASGSGIAWGRHRGSWLVLGLEHFFIj^ 

FAQVMAVHRRRIiDGLFVAVEGlSrSSQDSAVAIAIiGVRREIjDSIjAASGAVPMPAETRF!^^ 
FLIiQKQICrAAFDHFIRLFNSGRWASQDIiASLTVRLQTDPVEYLFEQIjQl^ 
AADDLMVAIiVMAVFVGSIiPPTDGAFCPLAPRPPAD 
>ff±_5a698 08 

1 0 MSLIMFGRTLGEESVRYFERIiKRiaiDERFGTIjESPTPCSTRQGSLC^ 

IGTLHKCCEYIPXiFSATARRAMFGAFIiSSTGYNCTPNVVLKPWRYSVNANVSPELKKAVSSVQFYEY 

EAAPHRNAYSGVMOTFRAFSLSDSFCQIiSTFTQRFSYIiVETSFESIEECGSHGKRAKVDVPI 

ELFQKMIIi^ErTHFISSVLIiGDHADRVDCFLRTVFOT 

lALVMATFRGIKVGYTAHIRKATEPVFEGIKSRIiEQWFGMSfYVDHV^ 
1 5 TNGIRGQDFNIiLFVDEAlTFIRPDAVQTI VGFUSfQTNCKI IFVSSTNTGKASTSFLYXfTLRGSSDQIjIiNVVT 

YVCDDHMPRVIiAHSDVTACSGYVIJJTKPVFITMDGAMRRTADLE^^ 

TKTARERFILYRPSTVAWCAILSSVIiYVYVDPAFTSimil^GTGVAIVGRYKSDW 

TSSSEIGRCOTQCLGHILAIiHPJSn'FTISnraVSIEG^ 

HS I PPGCSVAYPFFLIiQKQKTPAVDYFVKRFNSGNI lASQELVSLTVKLGVDPVEYLCKQIiDNLTEVIKG 
20 GMGNLDTKTYTGKGTTGTMSDDLMVAIilMS VYIGSSCI PDSVFMPIK 
>gi_5708110 

MLGKESVEI VKRYRDALRKRTMERQPDDVDGQEMSDSNFITTAS I CDKE^ 

QRHQACIAPIGSFHNCCAISRAFS YMASEI I YEWLASYSTKYTDTDAAXJTO IL 
PAIiRQKLAIJIiNFARFAPSDSIilHDKAFDGIMNGYRGFV^ 
25 RAKLEKTTSEQRDGTLELFQKMILMHATYFASS laUSEGSTERSNRYLSTVF^ IQHFRQRTT 
VFLVPRRHGKTWFIiVPIiISIiIiVSSFEGIRIGYTAHLRKATEPVFIEIFT^ 
TFRNGNKSAIVFASSQimTGIjRGQDFlSrFX.FVDEA]5rFIKPAALOT 
LLYNLKGKTNSIilJSTVVTYICDEjHMPEIQKRTDVTTC^ 

GGRAGKYDSDRTLVPViyiajDQFLIYRPSTSSKPNISGLGKIIiTVYVDPAFTTI^ 
3 0 MVIiMGAEHFYIiDALTGEAAiiEIAQCVr^CIAYCCLIHAGAFREIRIA 

IiRRRLGFSIiTFAHSRQPGTAMAHPFYIiLNKQKSRAFDriFVSLFNSGRFiyK^ 

DQIRNITVTHGQGPDSFRTFSGKQGRVPDDMIiVAAVMSTYIiAIiEGSPTAGYHPIAPI 

>gi_X813970 

MLRGDSAAKIQERYAELQKRKSHPTSCISTAFTNVATLCRTO 
3 5 RDYNS PEESQRELLFHERLKSALDKLTFRPCSEEQRASYQKIjDAIjTELYRDPQPQQllSnS^ 
GFSTAVEGDAKAIRIiEPFQKiyn^IHVIFFIAVTKIPVIaANR^ 
LVPRRHGKTWFIIPIISFLLKHMIGISlGYVAHQKHVSQFVLKEVEFRCatHTF 
RGAKSTALFASCYETOTSIRGQNFHIiTtTiVDEAHFIKKEAFirriliGFIAQNOT 

RLNNAPFDMI.NVVS YVCEEHLHS FTEKGDATACPCYRI^KKPTFI SLNSQTOKTANMFMPGAFMDE 1 1 GGT 
40 NKISQOTVLITDQSREEFDILRYSmmTAYDYFGKl^ 

GLEHFFLRDLSESSEVAIAECSAHMIISVIiSIiHPYLDELRIAVEGl^ 
VLFYHTPDQNHIEQPFSnuMGRDKALAVEQFISRFNSGYIKASQELVSCT 
TIiAEGTTARYSAKRGNRISDDIjIIAVIMATYIiCDDIHAIRFRVS 
>gi_2745296

45 MIiRSCDIDAIQKAYQS 1 IWKHEQDVKISSTFPNSAIFCQKRFI ILTPELGFTHAYCRHVKPLYLFCa^RQR 
HVKSKIAIODPIiNCmiSKLKFTAIIEKETTEVQYQKHLiEL 
NKKERIKLEPFQKS ILIHI IFFISVTKLPTLAimTIiDYLKyKFDIEFINESSTO
KIWMIPVICFliLKlSniiEGI S IGYVAHQKHVSHFVMKDVEFKC^ 
FASCYimiS IRGQSFNLLI VDESHFIKKDAFSTILGFIiPQSSTKI IFISSTO 
MLTWSYVCEDHVHIIiNDRGNATTa^CYRIiHKPKFISINJ^ 
5 LITEQGLIEFDLFRYSTISKQIIPFLGKELYIYIDPAYTIJmRASGTGVAAIGTYGDQYIIYCa^ 

SUliSNSDAS lAECASHMlIiAVIiEliHPFFTELKI I lEGNSNQSS AVKI ACILKQTI S VIRYKHITFFHTI»D 
QSQIAQPFYIiLGREKRIiAVEYFlSNFJJTSGYIKASQELISFTIKITYDFIEYVrEQIK^^ 
NAKKQTCSDDIiIiIS I IMAIYMCHEGKQTSFKEI 
>gi_^325496 

1 0 MLRTCDITHIKEnmAIIWKGERDCSTISTKYPNSAIFYKKKFim 

RHLKrmKPIiTILPSIiSHKLQEMKFLPASDKSFESQYTEFLESFKIIiYREPLFL 
JSroFGDTRKIQLEPFQKNILIHVIFFIAVTKLPALAimVIJRY^ 
RHGKTWFIVPIlSFLLKNIEQrSIGYVAHQKHVSHFVMKEVEFKCRRMFPEKTI^ 
TALFASGYNTHQSIRGQSFlSttjIilVDESHFIKKDAFSTILGFIiPQA^ 

1 5 SPFEMLSWSYVCEDHAEIMIjNERGNATACSCYRIjHKPKPISINAEVKKTAI^ 
IlTDVLITEQGQTEFEFFRYSTINKCJDIPFIiGKDI.YVYIiDPAYT<a^ 

YPIiESIiMTSSDTAIAEC&AHMILS ILDIiHPFPTEVKII lECajTSlSTQASAVKrACIIKENITAIIKS IQVTFP 
HTPDQNQIAQPFYIiLGKEKiaijAVEFFISimTSGNIKASQEIilSFTIKITYDPVEYAI^ 
YITYSAKKQACSDDIil lAI IMAIYVCSGISrSS2^FREI 
>gi_854039

MKLNNSPFEMLSWSYVCEDHAHMtiS^ IlSAEVKKTAlSrLFIiEGAFIHEIMGG 
ATOTVIiroVLITEQGQTEFEFFRYSTINKNIil PFLGKDIiYVYIjDPAYTGNRRASGTGI AAIGTYLDQYI V 
YGMEHYPLESLMTSSDTAIAECAAHMIIjS ILDIiHPFFTEVKI I lEGNSNQASAVKI ACI I KENITANKS I 
QVTFFHTPDQNQIAQPFYLIiGKEKKIjAVEFFISNFN^SGNIKASQELIS FTI KITY13PVE IRNIHQ 
25 ISVNNYITYSAKKQACSDDIiIIAIIMAIYVCSCasrSSASFREI 
>gi_5733564 

MZjRTCDITHIKNNYEAIIWKGERNCSTISTKYPNSAIFYKKRFIMIiTPELGFAHSYlSr^^ 
RHLKNRKPLTILiPSLTRiOiQEMiaFXPASDKSFESQYTEFLESFKILYREPIi^ 
NDFGDTRKIQIiEPFQKNIIiIHVIFFIAVTKLPAIiANRVINYIiTHV^ 
3 0 RHGKTWFI VPI ISFUjKNIEGIS IGYVAHQKHVSHFVMKEVEFKOOiMPPEKTITaCjDNVITIDHQNIKS 
TAIiFASC'JOTHSIRGQSFNriLIVDESHFIKKDAFaTILGF^ 
■ PFEMIJS WSYVCEDHAHMIiNERGNATACSCYRliHKPKPIS I 

NDVLITEQGQTEFEFFRYSTINKISILI PFLGKDliYVYliDPAYTGNRRASGTGI AAIGTYLDQYI VYQIEHY 
FLESIJXrrSSDTAIAEauUmiljSILDLHPFFTEVKIIIEGNSNQ^ 
35 TPDQNQIAQPFYIiLGKEKKIAVEFFISNFNSGNIKASQEIiISFTIKITYDPVEYAIjEQIRNIHQISV^^ 
ITYSAKKQACSDDIiI lAI IMAIYVCSGNSSASFREI 
>gi_4995048

MKLEilNS PFEMLS WS YVGEDHZ^HMLNERGfiTATACS CYIOiHKPKF I S INAEVKKTAlffLFLEGAF I HE IMGG 
ATCNViroVLITEQGQTEFEFPRYSTINKlSrLI PFLGKDLYVYLDPAY^ 
40 YGMEHYFLESLMTSSDTAIAECAAHMIIiS ILDLHPFFTEVKI IIEGNSNQASAVKIACI IKENITAITKSI 
QVTFFHTPDQNQIAQPFYIjLGKEKKLAVEFFISNFNSGITIK2iSQEXiISFTIKITY^ 
ISVNNYITYSAKKQACSDDIiIIAIIMAIYVCSGNSSASFREI 
>gi_113 6a08 

^!IlLSRHRERIiAAmiEETAKDAGERWELSAPTFTRHCPKTARMAHPFIGV^^ 
45 TPTSANPDVGTPRPSEDOTPAKPRIiIiESIjSTYIiQMRCVRSaDAHVSTMQIiVEY?^ 
EIiQAFLVICTiSSFIiNGCOTPGVHWIiEPPQQQLVMHTFFFLVSIK^ 
TFKQKASVFIilPRRHGKTWIWAIISMLIASVENINIGYVaHQK^^ 

KENGTI lYTRPGGRSSSLMCATCFNKNS IRGQTFNIiLYVDEAITFIKiaDALPAILGFMIiQKDAKIiIFISSV 
NSSDRSTSFIiIiKLRNAQEKimiTWSyVCADH^ IKTTTNIiFME 
GAFDTELMGEGAi^SlSrATLYRWGDAALTQFDMCRVDTTAQEVQKCLGKQLF^ 
5 GAVVTSTQTPTRSLIIiGMEHFFLRDLTGAAAYEIASCACmiKA^^ 

lATVIjNEICPLPIHFLHYTDKSSALQWPIYMLGGEKSSAFETFIYAIiNSGT^ 
TYLVEQVRAIKCVPIiRDGGQSYSAKQKHMSDDIOijVAVVJ^^ • • 

>gi_1718281 

MLQKDAKlilFISSVNSSDRSTSEljIiNIiPN^ 
10 ' IDESIKTTTmFMEGAFDTELMGEGAASSNATLYRWGDAA^ 

PAYTNm^SGTGVGAVVTSTQTPTRSLlLGMEHFFLRDIiTGAAAYEI^ 

AAVEGNSSQDSGVAIATVIJ^ICPLPIHFLHraDKSSAXiQWPIYMLGGE 

TWSOTIKISFDPVryiiVEQVRAIKCn^IiRDGGQSySAKQKH^ 

Q 

15 >gi_2246515 

MI4QKDAKLIFI SSVNSSDRSTSFLlJSnLiRNAQEKMIjl^^ PTYIT 
IDESIKTTTNLFMEGAFqTELMGEGAASSNAmYRWGDAALTQ 
PAYTNNTEASGTGVQAVVTSTQTFIMLILGMEHFFIjRDLTGi^ 
AAVEGNSSQDSGVAIATVIJSIEICPLPIHFLHYTDKSSALQWPimEjGGEKS^^ 
20 TWSNTIKISFDPVTYIiVEQVRAIKCVPIjRDGGQSYSAKQK^ 
Q 

>gi_2246552 

^^^SRHRERIlAANl.QETAKDAGER!^LSAPTFTRHCPKTARMAHPFIGVV^ 
TPTSANPDVGTPRPSEDNVPAKPRIiLESLSTYLQMRCVREDAHVSTADQLVEyQAARKTm 
25 ELQAFLVIJLSSFIjNGCWPGVHWIjEPFQQQIiVMH^ 

TPKQKASVFLIPRRHGKTWIVVAI ISMliIiASVENINIGYVAH^ IKTLCRWFPPKNLNIK 
KENGTI I YTRPGGRSSSUVrcATCFCTKlTS IRGQTF^!^IlLYVDEANFIKKDALPAILGF^ILQKDAICLIFISS V 
NSSDRSTSFLIiNIiRNAQEKMIiNWSWCADHR^ 

GAFDTELMCffiGA2^SNATLYRWGDAAliTQFDMaRVDTTAQQVQKCLGKQL 
3 0 GAVVTSTQTPTRSLII.GMEHFFriRDl.TGAAAYEIASC:aCTMIKAIAA^^ 

lATVLNEICPIiPIHFLHYTDKSSALQWPIYMIiGGEKSSAFETFIYAIiNSGTLSASQTW 
TYIiVEQVRAIKCVPLRDGGQSYSAKQKHMSDDIjLVAVVMAHFm^TDD^^ PQ 
>gi_4494933 

MLQKDAKIiIFISSSNSSDKSTSFIilJiniKDAHEKMr^^ PAYIT 
3 5 IDETVRSTTI^FLEGAFSTELMGDAATSAQSMHKIVSDSSLSQIjDI*^ 
AYTlsnsrrDASGTGIGAVlAViraKVIKCIliLGVEHFFLR^ 
AVEGWSSQDAGVAIATVIOTICSVPLSFIiHHVDKimilRSPI'Ym 
WSHTIKLSFDPVAYLIDQIKAIRCIPLKDGGHTYO^QKTMSDDVIiVAAVMAHYm 
>gi_7330018

40 MI,QKDAKLIFISSSNSSDKSTSFIilJ<ILKDAHEKMIJ^^ 

IDETVRSTTNIiFIiEGAPSTEIiMGDAATSAQSMHKIVSDSSLSQIiDLCK 
AYT3Snm)2^GTGIGAVIAVNHKVIKCIIiLGVEHFFtiRDLTC 

AVEGNS SQDAGVAI ATVLNEI CS VPIjSFLHHADKNTIiIRS PI YMLGPEKAKAFES F I YAliNSGTFS ASQT 
WSHTIIOiSFDPVAYIiIDQIKAIRCIPIjKDGGHTYCaKQKTMSDDVIjV'^ . 
>gi_4019255

MljLLKAKKALMElsmTEASSTQSETEWr^ 
IiALKQPLPQTGTLRIJliPSEKPYISQKDSimrKSLTLKHVraiDIE 
PI imiSSFIiNGCTVKKSTHIEPFQLQIiIIOTFYFrilS IKSPEST^^ 

KAS IFIiIPRRHGKTWI WAX ISrttilTSVElSrLHVGYVAHQKHVJmSW IJSTTLQKWFPSKNIDVKKENG 

TI I YKI PGKKPSTIjMCASCFNKNS IRGQTFNIjIiYIDEANFIKKDSLPAILGFMLQKDAKL I FISS VNSGD 
5 KATSFLFEHljKNASEKMIjNIVlTyiCPDHKDDFSLQDSLISCPC^ 

TEIiMCTISVMSKimiHKVIGETAIjMQFDLCRIDTTKPElTQCm IMYLYIDPAYTNNSEASGTGIGAI I 

ALKNNS SKCI I VGIEHYFLKDLTGTATYQI AS CACSLIRAAIjVIiYPHIQAVHVAVEGNSSQDS AVAI STF 

LNECSPVKVNFMHYKDKTTAMQWPIYMLGSEKSQAF^ I ISNTIKLTFDPISYLI 

EQIRAIRCYPljRDGSHTYCa^KKRTVSDDVIjVAVVMAHFF 
>gi_4019257

^mQKDAKLIPI SS VNSGDKATSFIjFNLKNASEKMLNI VNYI CPDHTO 

IDETIKlJrraaiFIiDGAFTTEIiMGDISWSK^ 
' PAYTKnsrSEASGTGIGAIIALKlSnffSSKCIIVGIEHYFljKDLTC 

VAT^EGNSSQDSAVAISTFIiNECSPVKVNFMHYKDKTT^ 
15 SI ISlSPTIKLTFDPISYLIEQIRAIRCYPIiRDGSHTYCAKKRTVSDDVLVAVV^ 

I 

>gi_60355 

MLLLKAKKAI lENliSEVSSTQAETDWDMSTPTI ITNTSKSERTAYSKIGVI PSVNLYSSTLTSFCKLYHP 
LTLNQTQPQTGTIiRliLPHEKPLILQDLSimrKIjLTSQNVCHDTEAm^YNAAVQTQ 
20 FVIHLSSFIiNGCYVKRSTHIEPFQIiQIjILHTFYFIjIS IKS PESTNRIiFDI FKEYFGLREMDPDMLQIFKQ 
KAS IFLI PRRHGKTWIWAI ISMLLTSVENIHVGYVAHQKHVANSVFTEI INTLQKWFPSRYIDIKKENQ 
TI lYKSPDKKPSTIiMCATCFNKNS IRGQTFNIjIiYIDEANFIKKDSIiPAIl.GFW!LQKDAKLIFISSVNSGD 
RATS FliFNLKNASEKMIiNI VNYI CPDHKDDFSLQDSIilS CPCYKLYI PTO 
TELMGDMSGISKSNMHKVISEMAITQFDIiCRADTTKPEITQCmSTI^ 

TFKfflNSSKCI IVGMEHYFIiKDLTGTATYQIASO^CSIiIRASIiVIiYPHIQCVHVAVECTfS SQDSAVAISTL 
IHECSPIKVYFIHYKDKTTTMQWPIYmiQAEKSIAFESFIYAINSGTlSASQSIISOTIKLSPD 
EQIRS IRCYPIiRDGSHTYCAKKRTVSDDVLVAVVMAYFFATSlSra 
>gi_595201

MLQKDAKI IFISS VNSSDQTTSFLY2JIIjKN2VKEKimmrC^ 
IDENIIOSTTBrDFMEGAFTTELMGDGAAATTQTNMHKW 

DPAYTlSrr<n?EASGTGMGAWSMKNSDRCVWGVEHFFIjKELTGA^ SIiQIAS CAAAIiIRSLATIiHPFVREAH 
VAIEGNSSQDSAVAIATLLHERSPLPVKFIJmADKATGVQWPMYILGAEia^RAra 
AIVSNTIKtiSFDPVaYIiIEQIRAIKCYPIiKDGTVSYCAKHKGGflDDTriVAVVMAHYFATSDRHVF^ 

QI 

>gi_4923934

MliliSSFRimLQKNYEKYSVQAQNIDWPVETPVIilSKDSKTim^ 
TKQPKFTPDIGYVRDLBaCHDQYFIiPKIiQHHLSTriCEAYiniVDRQAQT^ 
FLIITLSCFIiNGCYVSKSTCIEIjFQKQIiILHTFYFIiISIKTPEETNKMFI^ 

KSTVFLIPRRHGKTWIWAIISVIJ^VENVHIGYVAHQKHVANA 

TI I YTKPGRKPSTLMCATCFNKNS IRGQTFNIIiYVDEAOTIKKEAIiPAIIiGFMLQKDAKI IFISSVNSAD 
KSTSFllFNIlRNAKEK^!LaST\^715r^ 

TEMGDISTFPTSSMFKVVEEQALFHFDIGRVDTTQIDTVKIIDNVLYVYTO 
PLKTKOTTIILGIEHFYIiKNLTGTASQQIAYCVTSMIKAIIiTLHPHINHyiWA^ 
NEYCPVPVFFAHCNERSSWQWPIYILGSEKSGAFEKPICAIiPrrGTI.SASQT^ 
QIRAIRaUPXjKDGSYTyCAKQKTMSDDTriVAVVMANYm.ISEKHTFK^ 
>gi_lS32798 
WttiYASQRGRIiTEHLRNALQQDSTTQGCLGa^ 

GPRPVFVASDESIiHlFGASPAIimPVQVQSMCIjIiPEIjRDTLQRIiLPPI*:^^ 
DPNFIiEMREFVTSIiASFLSGQYKtnCPARLEAFQKQVV^ 
IiEKIJIIFKQKASVFIiIPiaiHGKXWIWAIISIjILSNIiSNVQIGW^ 
5 RVEVPTECETSTITFRHSGKISSTVMCSlTCPNKNS^ I 
FISSVNSADQATSFLYKLIODAQERIiLNWSYVCQEHRQDFDMQDSMVSCPCFRtiHIPS 
NliFLDGAFSTELMGDTSSLSQGSLSRTVRDDAINQIiELCRVDTIiNP^^ 
SGTGIAAVTHDRADPNRVIVLGLEHFFLKDLTGDAAIiQIATCOTiUljVSSIV^^ 
DSAVAlASIIGESCPIiPCAFVHTiaDKTSSLQWPMYliIiTNEKSKAFERIi 
1 0 SFDPVLYLISQIRAIKPIPriRDGTYTYTQKQRNlJSDDVIiVAIiVR^^ 
>gi_2337991 

MFYVKVMPALQKACEELQNQWSAKSGKWPVPETPIiVAVETRRSERWPHPlOiGIiIiPGVAAY 
YNPYIDAliTRODLGQTHRRVATQPVIjSDQIiCQQIjKKLFSCPRlSrrSVKAK^ 
IiKTFVIiNIjSAFIJiTKRYSDRSSHIEIiFQKQLIMHTFFFIiVSIKAPELC^ 
1 5 FKQKAS VFIil PRRHGKTWI WAI I S ILIaAS VQDLRI GYVAHQIOIVliJSrAVFTEVIimiHTFFPGKY^ 

ENGTI IFGLPNKKPSTLLCATCFNKNS IRCSQTFQIiLFVDEANFIKKDAIiPTILGFMLQKDAKI IFISSSN 
SSDQSTSFIlY^rLKGASERMIilWVSYVCS]S^HKEDFSMQDGIiISCPCYSI^ 

VFDTELMGDSSC6TLSTFQI ISESAliSQFELCRIDTAS PQVQAEIiNSTVHMYIDPAFTNNLDASGTGl S V 
Table 3. Degenerate primers generated by CODEHOP

Block x7263xbliD

TLYVYIDP
oligo: 5'-AACCTGTACGTGtayntngaycc-3' degen=64 temp=33.4 Extend clamp

TLYVYIDPA
oligo: 5'-AACCTGTACGTGTACntngayccngc-3' degen=128 temp=36.0 Extend clamp

TLYVYIDPAY
oligo: 5'-AACCTGTACGTGTACATngayccngcnt-3' degen=128 temp=42.5 Extend clamp

Complement of Block x7263xbliD

YIDPAYTNNT
atrnanctrggGCGGATGTGGTTGTTGT
oligo: 5'-TGTTGTTGGTGTAGGCGggrtcnanrta-3' degen=64 temp=62.9

DPAYTNNTRA
anctrggncgnaTGTGGTTGTTGTGGGTCCG
oligo: 5'-GCCTGGGTGTTGTTGGTGTangcnggrtcna-3' degen=128 temp=61.8

DPAYTNNTRA
ctrggncgnawGTGGTTGTTGTGGGTCCG
oligo: 5'-GCCTGGGTGTTGTTGGTGwangcnggrtc-3' degen=64 temp=61.0

Block x7263xbliE

CIIFGMEHFF
oligo: 5'-TGGATCATCTTCGGCATngarcaytwyt-3' degen=64 temp=55.7 Extend clamp

IFGMEHFFL
oligo: 5'-CATCTTCGGCATGGAGcaytwytwyyt-3' degen=64 temp=62.0

Complement of Block x7263xbliE

EHFFLRDLTG
ctygtrawrawGGACTTCCTGGACTGCCC
oligo: 5'-CCCGTCAGGTCCTTCAGGwarwartgytc-3' degen=32 temp=61.7

HFFLRDLTG
tygtrawrawrrACTTCCTGGACTGCCCG
oligo: 5'-GCCCGTCAGGTCCTTCArrwarwartgyt-3' degen=128 temp=60.8

HFFLRDLTG
gtrawrawrraCTTCCTGGACTGCCCG
oligo: 5'-GCCCGTCAGGTCCTTCarrwarwartg-3' degen=64 temp=60.8

Block x7263xbliF

EVHIAVEGN
oligo: 5'-GGACGTGCACGTCGCCrtngarggnaa-3' degen=64 temp=63.8

Complement of Block x7263xbliF

EGNSSQDSA
anctyccnttrwGGTTGGTCCTGAGGCGG
oligo: 5'-GGCGGAGTCCTGGTTGGwrttnccytcna-3' degen=128 temp=62.7

EGNSSQDSAV
ctyccnttrwsGTTGGTCCTGAGGCGGC
oligo: 5'-CGGCGGAGTCCTGGTTGswrttnccytc-3' degen=64 temp=63.9
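
The degen values in Table 3 can be cross-checked by multiplying, position by position, the number of bases each IUPAC code stands for. The following is a minimal Python sketch of that arithmetic, not part of CODEHOP itself; the function names `degeneracy` and `reverse_strand` are hypothetical, and the two sequences are copied from the Block x7263xbliE entries above. It also assumes that the complement-strand lines in the table are written 3' to 5', so that reversing one gives the corresponding oligo in 5' to 3' orientation.

```python
# Number of concrete bases represented by each IUPAC nucleotide code.
IUPAC_BASE_COUNT = {
    "a": 1, "c": 1, "g": 1, "t": 1,
    "r": 2, "y": 2, "s": 2, "w": 2, "k": 2, "m": 2,
    "b": 3, "d": 3, "h": 3, "v": 3,
    "n": 4,
}


def degeneracy(oligo: str) -> int:
    """Return how many distinct sequences a degenerate oligo represents."""
    count = 1
    for code in oligo.lower():
        count *= IUPAC_BASE_COUNT[code]
    return count


def reverse_strand(listed_3_to_5: str) -> str:
    """Rewrite a strand listed 3'->5' in the conventional 5'->3' direction."""
    return listed_3_to_5[::-1]


# Forward oligo from Block x7263xbliE (listed in the table with degen=64).
forward = "TGGATCATCTTCGGCATngarcaytwyt"
# Complement-strand line from the same block (listed with degen=32).
lower_strand = "ctygtrawrawGGACTTCCTGGACTGCCC"

print(degeneracy(forward))
print(degeneracy(lower_strand))
print(reverse_strand(lower_strand))
```

In these listings the upper-case 5' portion is the non-degenerate consensus clamp and the lower-case 3' portion is the degenerate core, so only the lower-case positions contribute factors greater than one.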
CLAIMS 

1. A method of designing a panel of degenerate primer pairs for screening for new
members of multiple known virus families in a biological sample, wherein each
primer pair in the panel binds a sequence that is conserved across members of a said
virus family and selectively directs amplification of sequence of said family by PCR,
which method comprises

(a) providing a plurality of amino acid sequences from members of a first virus
family,

(b) comparing the sequences to identify conserved regions,

(c) designing a first primer pair using a computer based method, wherein each primer
in the pair binds a nucleotide sequence that encodes a conserved region identified in
(b) and wherein the primer pair is designed to amplify by PCR the nucleotide
sequence between the nucleotide sequences that encode conserved regions in
members of the first virus family, and

(d) repeating steps (a) to (c) for each virus family.

2. A method of designing a panel of degenerate primer pairs for screening for new
members of multiple known virus families in a biological sample, wherein each
primer pair in the panel binds a sequence that is conserved across members of a said
virus family and selectively directs amplification of sequence of said family by PCR,
which method comprises

(a) providing a plurality of nucleotide sequences from members of a first virus
family,

(b) comparing the sequences to identify conserved regions,

(c) designing a first primer pair using a computer based method, wherein each primer
in the pair binds a conserved region identified in (b) and wherein the primer pair is
designed to amplify by PCR the nucleotide sequence between the conserved regions
in members of the first virus family, and

(d) repeating steps (a) to (c) for each virus family.

3. A method according to claim 1 or 2 which further comprises synthesising one or
more of the primer pairs and determining optimal conditions for using the primer
pairs in PCR.

4. A method according to any one of the preceding claims which comprises testing
the ability of one or more of the primer pairs to amplify a nucleotide sequence that
encodes an amino acid as defined in claim 1(a) or a nucleotide sequence as defined in
claim 2(a).

5. A method according to claim 3 or 4 which comprises testing the primer pair(s) in a
range of buffer conditions to determine the optimal buffer conditions for PCR.

6. A method according to any one of claims 3 to 5 which comprises testing the
primer pair(s) at a range of different temperatures to determine the optimal
temperature for PCR.

7. A method according to any one of the preceding claims which comprises
identifying one or more groups of primer pairs wherein the primer pairs in each
group have similar optimal conditions of use in PCR such that they can be used
optimally in the same reaction vessel.

8. A method according to claim 7 wherein each primer pair in a group generates a
PCR product of a different size to the other primer pair(s) in the group.

9. A method according to claim 7 or 8 wherein each primer pair in a group carries a
different label from the other primer pair(s) in the group.

10. A method according to claim 9 wherein each primer pair in a group carries a
differently-coloured fluorescent label.

11. A method according to any one of the preceding claims wherein the biological
sample is a single-source sample from a single individual or is a pooled sample from
more than one individual of the same species.

12. A method according to claim 11 wherein the biological sample is a human
sample.

13. A method according to any one of the preceding claims wherein at least 50% of
the primer pairs bind a sequence that is conserved across all of the genuses and/or
subfamilies.

14. A panel of primers designed according to any one of the preceding claims.